Learning and Exploration in Autonomous Agents
Author
Abstract
Place learning and exploration in autonomous agents is important for understanding and building intelligent agents. Experimental and computational studies in psychology, computer science, and neuroscience have achieved great success. This thesis provides a theoretical investigation into three major problems in place learning and exploration: localization, mapping, and action selection. Two benchmark models are introduced to analyze the basic aspects of place learning and exploration. The first, the checkerboard maze, is a stochastic grid-type environment. The exploration performance of an agent is evaluated by its sensory prediction capability, so the evaluation does not require knowledge of the agent's internal representation. The checkerboard maze is then reduced to the second model, the classical multi-armed bandit, in order to analyze the action selection problem in detail. Exploration performance in the multi-armed bandit is quantified by the estimation error of the reward means.
Place learning and exploration is modelled as a Partially Observable Markov Decision Process (POMDP) and is implemented by a Bayesian network with internal dynamics. The map of the environment is represented by the observation probability of the POMDP and is stored in the weights of a generative model. The belief state is tracked by Bayes filtering. The distribution of the sensory input is predicted by the generative model. The mapping between locations and sensory inputs is learned by minimizing prediction errors with an on-line adaptive multiplicative gradient descent rule.
In the n-armed bandit, the optimal exploration policy in the sense of total mean squared error is proved to be gain-maximization exploration. The sample complexity of the proposed ideal gain-maximization exploration policy can be O(n), as small as that of counter-based and error-based policies, both in the sense of total mean squared error and of expected 0/1 loss.
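The belief tracking by Bayes filtering mentioned above can be illustrated with a minimal sketch. It is the textbook discrete Bayes filter, not the thesis's network implementation; the function name `bayes_filter_step` and the table layouts (`transition[a][s][s2]` for P(s2 | s, a), `observation[s][o]` for P(o | s)) are assumptions made here for illustration.

```python
def bayes_filter_step(belief, transition, observation, action, obs):
    """One predict/correct step of a discrete Bayes filter.

    belief:      list of P(s) over hidden locations s
    transition:  transition[a][s][s2] = P(s2 | s, a)   (assumed layout)
    observation: observation[s][o]    = P(o | s)       (assumed layout)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [
        sum(belief[s] * transition[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correct: weight each location by the likelihood of the observation.
    unnormalized = [predicted[s2] * observation[s2][obs] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the new belief sums to one.
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Hypothetical two-location world with a single "stay" action.
    stay = [[[1.0, 0.0], [0.0, 1.0]]]
    sensor = [[0.9, 0.1], [0.2, 0.8]]
    print(bayes_filter_step([0.5, 0.5], stay, sensor, action=0, obs=0))
```

In this sketch the observation table plays the role of the map: it is the quantity that, according to the abstract, is stored in the weights of the generative model and learned from prediction errors.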
For realistic situations where the reward variances are unknown, a realistic gain-maximization exploration policy is derived using upper confidence limits of the reward variances. Gain-maximization is a general principle unifying a wide range of exploration strategies, including counter-based and error-based policies. By generalizing the total mean squared error, the counter-based and error-based exploration policies are shown to result from the gain-maximization principle with respect to different variants of the general objective measure. Formulating exploration in reward maximization as the learning of the differences between the reward means, we derive gain-maximization selection policies both for the ideal case and for realistic situations. Through a simple linear trade-off, gain-based reward-maximization policies achieve smaller regret on fixed data sets than classical strategies such as interval estimation methods, the ε-greedy strategy, and the upper confidence bound policy.
Action selection in the full place learning problem is implemented by a network maximizing the agent's intrinsic rewards. Action values are learned in a way similar to Q-learning. Based on the results of local gain adaptation and the multi-armed bandit, two gain functions are defined as the agent's curiosity. Moreover, an estimate of a global exploration performance measure is found to model competence motivation. Active exploration directed by the proposed intrinsic reward functions not only outperforms random exploration, but also produces curiosity behavior of the kind observed in natural agents.
Acknowledgments
The results presented in this thesis were obtained during my work at the Institute for Theoretical Neurophysics at the University of Bremen. It has been my honor and pleasure to work with my colleagues, who have been friendly and supportive throughout. First of all, I would like to thank Klaus Pawelzik for opening this exciting research field to me.
He has always been thought-provoking and insightful, guiding me to explore scientific questions. His generous support, kind encouragement, and warm-hearted friendship through all these years made my pursuit of science possible. I especially thank Prof. Wolfram Burgard, Prof. Helmut Schwegler, Prof. Ilja Rueckmann, Dr. Jan Nagler and Onno Boehler for serving on my thesis committee. I am also grateful to Prof. Bornholdt and Prof. Jaeger for their insightful comments and discussions. I am very thankful to Michael Herrmann for his constructive cooperation and discussions during these years. He is always ready to help wholeheartedly. I owe many thanks to Agnes Janssen, who creates an amazingly friendly working atmosphere for us. I am grateful to Udo Ernst and David Rotermund, who keep the computing facilities available at all times. I would also like to thank my colleagues Matthias Bethge, Christian Eurich, Axel Etzold, Nadja Schinkel, Erich Schulzke, Juan Ochoa, Stefan Braunewell, Maria Davidich, Joerg Reichardt, Roland Rothenstein, Hedinn Steingrimsson, Andreas Thiel, Frank Emmert-Streib, Jens Otterpohl, Ronald Bormann, and Klaus Franke. Last but not least, I want to thank my whole family for their constant love, support and encouragement. I will never be able to thank them enough.
Similar resources
Efficient Exploration in Reinforcement Learning Based on Utile Suffix Memory
Reinforcement learning addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goals. Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually address exploration in the case when the environment is fully observable. In contrast, we study the case ...
Autonomous Exploration For Navigating In MDPs
While intrinsically motivated learning agents hold considerable promise to overcome limitations of more supervised learning systems, quantitative evaluation and theoretical analysis of such agents are difficult. We propose to consider a restricted setting for autonomous learning where systematic evaluation of learning performance is possible. In this setting the agent needs to learn to navigate...
Efficient Exploration in Reinforcement Learning Based on Short-term Memory
Reinforcement learning addresses the question of how an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals. It is related to the problem of learning control strategies. In practice multiple situations are usually indistinguishable from immediate perceptual input. These multiple situations may require different responses from the age...
Active Learning for Autonomous Intelligent Agents: Exploration, Curiosity, and Interaction
In this survey we present different approaches that allow an intelligent agent to autonomously explore its environment to gather information and learn multiple tasks. Different communities have proposed different solutions that are, in many cases, similar and/or complementary. These solutions include active learning, exploration/exploitation, online learning and social learning. The common aspect of a...
A Multi-Agent Approach to Environment Exploration
Exploration is a central issue for autonomous agents which must carry out navigation tasks in environments of which a description is not known a priori. In our approach the environment is described, from a symbolic point of view, by means of a graph; clustering techniques allow for further levels of abstraction to be defined, leading to a multi-layered representation. In this work we propose an ...
A Shift into Autonomous Education
Fostering autonomous learning has become one of the key concerns of course designers and curriculum planners over the last 20 years, and it has been validated on both ideological and psychological grounds. However, estimating learners’ readiness to accept autonomous education is an important step prior to moving toward autonomous education. Thus, the current research investigated the patterns of au...